Hi, it's time to look at probability theory again, and we're going to start with random
variables, specifically continuous random variables. The things we're going to talk about today
are usually spread out over a few lectures in a classic probability course. We will review
those topics, and my goal here is to give you intuition if you haven't seen them before,
or a refresher if you have seen these results before. Everything today will be about random
variables in one dimension, taking values in R. These things generalize to higher-dimensional
variables, but we'll draw everything in 1D. So let's start. We call X a random variable if it
is a mapping from some abstract probability space, with sample space Omega, sigma-algebra A,
and probability measure P, into R. So X takes values in R.
I'd like you to forget almost everything about that probability space. It is important theory,
but it will not matter that much in this course, so if you haven't seen sigma-algebras before,
don't worry too much about that. A random variable has a probability density function (PDF for
short), rho_X, and a cumulative distribution function (CDF), F_X. Those two are connected via
the following formula.
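
In symbols (my rendering, since the formula itself is on the board), this is presumably the standard relation:

```latex
F_X(r) \;=\; P(X \le r) \;=\; \int_{-\infty}^{r} \rho_X(x)\,dx .
```
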
The probability that X takes on values less than or equal to r is the CDF evaluated at r, and it
can also be computed by integrating the density from minus infinity to r. So let's look at that
density. Say this is the density rho_X, as a function of x in one dimension, and let's say this
is r. Then the area underneath the curve to the left of this value r is the probability that X
takes on values less than or equal to r. We will only rarely use the CDF; we will mostly look at
the PDF rho_X. But sometimes we have to look at such events, and then the CDF comes in handy.
In general, though, you should think about random variables in terms of the density rho_X.
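
As a quick numerical sanity check of this PDF/CDF relation (this is not from the lecture; the standard normal is just an example density), one could do something like:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

r = 0.7
cdf_value = stats.norm.cdf(r)                       # F_X(r), the CDF at r
pdf_integral, _ = quad(stats.norm.pdf, -np.inf, r)  # integral of rho_X from -infinity to r

print(cdf_value, pdf_integral)   # both are approximately 0.758
```
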
For more general events, X in a set A, we can also integrate the density over A. So let's again
draw something like that. A could be a union of intervals, or something more complicated. The
probability that X lies in those two sets, this is A, is then the area under the graph of the
density over A. That is the probability. One thing that obviously has to hold is that the density
rho_X integrates to one over the whole domain: the integral of rho_X over all of R is equal to one.
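
A small sketch of the same idea for a more general event (again not from the lecture; the standard normal and the particular set A are just example choices): integrate the density over each piece of A, and check that the total mass is one.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# A = [-2, -1] union [0.5, 1.5], an example union of two intervals
A = [(-2.0, -1.0), (0.5, 1.5)]

# P(X in A): integrate the density over each interval and add the pieces
p_A = sum(quad(stats.norm.pdf, a, b)[0] for a, b in A)

# the density must integrate to one over all of R
total_mass, _ = quad(stats.norm.pdf, -np.inf, np.inf)

print(p_A, total_mass)   # roughly 0.378 and 1.0
```
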
The expectation of a random variable is given by the following quantity. It is essentially just
an integral against the density, but with this extra factor of x in the integrand. In my opinion,
the best way to see why this is the right definition is the following: let's approximate the
integral by a discrete sum, so roughly the sum over k of x_k times rho_X(x_k) times Delta x_k.
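
Written out (again my rendering of what is on the board), the definition and its discrete approximation read:

```latex
\mathbb{E}[X] \;=\; \int_{\mathbb{R}} x\,\rho_X(x)\,dx \;\approx\; \sum_k x_k\,\rho_X(x_k)\,\Delta x_k .
```
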
Now let's think about what that means. Let's say this is the density, and maybe here is zero.
What the sum says is that the expectation adds up position vectors. So here is x_k, and this
vector is the vector x_k. We sum up all those x_k for the various values of k, so here is another
point x_l, and each vector is weighted by the height of the density at that point. So this
position vector here, this x_k, gets a really strong contribution; this position vector here gets
a slightly smaller contribution; and if we take something far out here, it gets only a very weak
contribution. So this is a center of mass in some sense.
We take an average: the weights sum to one, so it is a weighted average of position vectors, and
points with a higher density are weighted more strongly. That is a really good picture for the
expectation. It is essentially a center of mass, an average, a mean: where the probability mass is
concentrated on average. The variance is a measure of how far the distribution is spread out. We
do something similar there: we measure deviations from the mean and weight them by the probability
density at that point. But I'm sure you have seen mean and variance at some point before.
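
As a sketch of this discrete-sum picture (not from the lecture; a normal distribution with mean 1 and standard deviation 2 is just an example whose answers we already know), one can approximate E[X] = integral of x * rho_X(x) dx and Var(X) = integral of (x - E[X])^2 * rho_X(x) dx by Riemann sums:

```python
import numpy as np
from scipy import stats

mu, sigma = 1.0, 2.0                                        # example: normal with known mean and variance
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 20001)    # grid points x_k
dx = x[1] - x[0]                                            # grid spacing, Delta x_k
rho = stats.norm.pdf(x, loc=mu, scale=sigma)                # density values rho_X(x_k)

mean = np.sum(x * rho) * dx                  # sum_k x_k * rho_X(x_k) * Delta x_k
var = np.sum((x - mean) ** 2 * rho) * dx     # same weighting, applied to squared deviations

print(mean, var)                             # approximately 1.0 and 4.0
```
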
So let's continue. When we do Bayesian inference, we often have two different objects, say a
parameter and an observable, so we will definitely have to work with two random variables at once.
That means we need a concept of two variables being jointly distributed. Let's look at this
formula first.
The probability that X and Y lie in some two-dimensional set A is the integral of the joint
density over this domain A, which makes sense because it is the 2D generalization of what we just
said in one dimension.
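
A 2D sketch to close the loop (not from the lecture; two independent standard normals and a rectangular A are example choices, and independence means the joint density is just the product of the marginals): approximate P((X, Y) in A) by a Riemann sum on a grid and compare with the closed-form answer.

```python
import numpy as np
from scipy import stats

# A = [-1, 1] x [0, 2], an example rectangle in the plane
a, b, c, d = -1.0, 1.0, 0.0, 2.0
x = np.linspace(a, b, 801)
y = np.linspace(c, d, 801)
dx, dy = x[1] - x[0], y[1] - y[0]

X, Y = np.meshgrid(x, y)
rho_xy = stats.norm.pdf(X) * stats.norm.pdf(Y)   # joint density of independent standard normals

p_A = np.sum(rho_xy) * dx * dy                   # Riemann sum for the double integral over A

# closed form for this particular A, via products of one-dimensional CDF differences
exact = (stats.norm.cdf(b) - stats.norm.cdf(a)) * (stats.norm.cdf(d) - stats.norm.cdf(c))

print(p_A, exact)                                # both approximately 0.326
```
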